{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Variational Autoencoder\n",
    "===============\n",
    "\n",
     "In this tutorial I will introduce the Variational Autoencoder (VAE). Let's start from the name, which is a blend of [Variational Bayesian Methods](https://en.wikipedia.org/wiki/Variational_Bayesian_methods) and [Autoencoder](https://en.wikipedia.org/wiki/Autoencoder).\n",
    "\n",
     "I will use the term **code**, which comes from information theory and is tightly connected to the model.\n",
     "The encoder produces the code, whereas the decoder reads the code.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Training a VAE\n",
    "------------------\n",
    "\n",
    "We define a **Gaussian distribution** through a mean vector $\\mu$ and a covariance matrix $\\Sigma$, as follows:\n",
    "\n",
    "$$\\mathcal{N}(\\mu, \\Sigma)$$\n",
    "\n",
     "The standard form is often replaced with a simplified one, called an **isotropic Gaussian**, whose covariance matrix takes the form:\n",
    "\n",
    "$$\\Sigma = \\sigma^{2} I$$\n",
    "\n",
     "The advantage is that the number of free parameters no longer grows quadratically with the dimensionality, as it does for the full covariance matrix, which makes the model less computationally expensive. However, the isotropic form implies that the dimensions of the distribution are independent and share the same variance. In a Cartesian plane, samples drawn from an isotropic Gaussian form a circular cloud."
   ]
  },
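  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustrative sketch (not part of the original derivation), we can count the free parameters of both forms and draw samples from an isotropic Gaussian with NumPy. The dimensionality and the values of $\\mu$ and $\\sigma$ below are arbitrary choices:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import numpy as np\n",
    "\n",
    "np.random.seed(0)\n",
    "\n",
    "# Number of free parameters in d dimensions (d chosen arbitrarily).\n",
    "d = 10\n",
    "# Full Gaussian: d means plus the d*(d+1)/2 entries of a symmetric covariance.\n",
    "full_params = d + d * (d + 1) // 2\n",
    "# Isotropic Gaussian: d means plus a single shared variance sigma^2.\n",
    "iso_params = d + 1\n",
    "print(full_params)  # 65\n",
    "print(iso_params)   # 11\n",
    "\n",
    "# Samples from a 2D isotropic Gaussian N(mu, sigma^2 I).\n",
    "mu = np.zeros(2)\n",
    "sigma = 1.5\n",
    "samples = mu + sigma * np.random.randn(1000, 2)\n",
    "# The cloud is circular: the empirical variance is roughly sigma^2\n",
    "# along every axis.\n",
    "print(np.var(samples, axis=0))"
   ]
  },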
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "The reparameterization trick\n",
    "--------------------------------\n",
    "\n",
     "An important technique used for training a VAE is the **reparameterization trick**. When mixing gradient descent and random variables, we must always be aware that [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) needs a deterministic path in order to estimate the gradients. When one of the nodes of the graph is a random variable drawn from a probability distribution, the flow of gradients is interrupted. The trick is to rewrite the sample as a deterministic function of the distribution parameters and an auxiliary noise variable, $z = \\mu + \\sigma \\epsilon$ with $\\epsilon \\sim \\mathcal{N}(0, 1)$, so that gradients can flow through $\\mu$ and $\\sigma$.\n",
    "\n",
     "A nice post on [medium](https://medium.com/@llionj/the-reparameterization-trick-4ff30fe92954) covers this trick. Another [post](http://blog.fastforwardlabs.com/2016/08/22/under-the-hood-of-the-variational-autoencoder-in.html) looks under the hood of the VAE. The [slides](http://dpkingma.com/wordpress/wp-content/uploads/2015/12/talk_nips_workshop_2015.pdf) of the NIPS workshop tutorial are also useful."
   ]
  },
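  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal NumPy sketch of the trick (plain NumPy stands in for an autodiff framework here; the values of $\\mu$ and $\\sigma$ are arbitrary): moving the randomness into an auxiliary variable $\\epsilon$ makes $z$ a deterministic function of the parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import numpy as np\n",
    "\n",
    "np.random.seed(0)\n",
    "mu, sigma = 2.0, 0.5\n",
    "\n",
    "# Direct sampling: the node itself is stochastic, so no deterministic\n",
    "# path exists from mu and sigma to z for automatic differentiation.\n",
    "z_direct = np.random.normal(mu, sigma, size=100000)\n",
    "\n",
    "# Reparameterized sampling: the randomness lives in eps ~ N(0, 1),\n",
    "# and z = mu + sigma * eps is deterministic given eps\n",
    "# (dz/dmu = 1, dz/dsigma = eps), so gradients can flow.\n",
    "eps = np.random.randn(100000)\n",
    "z_reparam = mu + sigma * eps\n",
    "\n",
    "# Both constructions yield the same distribution.\n",
    "print(z_direct.mean(), z_direct.std())\n",
    "print(z_reparam.mean(), z_reparam.std())"
   ]
  },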
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
